Similarity Assessment


Building Intelligent Databases through Similarity: Interaction of Logical and Qualitative Reasoning

Vilchis-Medina, José-Luis

arXiv.org Artificial Intelligence

In this article, we present a novel method for assessing the similarity of information within knowledge bases from a logical point of view. This proposal introduces the concept of a similarity property space $\Xi_P$ for each knowledge $K$, offering a nuanced approach to understanding and quantifying similarity. By defining the similarity knowledge space $\Xi_K$ through its properties and incorporating similarity source information, the framework reinforces the idea that similarity is deeply rooted in the characteristics of the knowledge being compared. On the one hand, the framework provides a structured way to organize and understand similarity; the inclusion of super-categories within $\Xi_K$ further allows for a hierarchical organization of knowledge, facilitating more sophisticated analysis and comparison, which can be particularly useful in complex domains. On the other hand, the finite nature of these categories may be restrictive in certain contexts, especially when dealing with evolving or highly nuanced forms of knowledge. Future research and applications of this framework will focus on addressing these potential limitations, particularly in handling dynamic and highly specialized knowledge domains.
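
As a loose, illustrative reading of this formalism, one might model each knowledge item's property space as a finite set and score similarity by property overlap; everything below (the Jaccard choice, the category labels, the field names) is an assumption made for illustration, not the paper's definition.

```python
# Minimal sketch of a similarity property space: each knowledge item K
# carries a finite set of properties (its Xi_P), and categories nest into
# super-categories. All names and the Jaccard measure are illustrative.
from dataclasses import dataclass

@dataclass
class Knowledge:
    name: str
    properties: frozenset     # the similarity property space Xi_P of K
    category: str             # leaf category within Xi_K
    super_category: str       # hierarchical super-category within Xi_K

def property_similarity(a: Knowledge, b: Knowledge) -> float:
    """Jaccard overlap of property sets; one plausible instantiation."""
    if not a.properties and not b.properties:
        return 1.0
    return len(a.properties & b.properties) / len(a.properties | b.properties)

dog = Knowledge("dog", frozenset({"barks", "four_legs", "mammal"}), "canine", "animal")
cat = Knowledge("cat", frozenset({"meows", "four_legs", "mammal"}), "feline", "animal")
print(property_similarity(dog, cat))  # 0.5: two shared of four distinct properties
```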


GPTDrawer: Enhancing Visual Synthesis through ChatGPT

Li, Kun, Chen, Xinwei, Song, Tianyou, Zhang, Hansong, Zhang, Wenzhe, Shan, Qing

arXiv.org Artificial Intelligence

In the burgeoning field of AI-driven image generation, the quest for precision and relevance in response to textual prompts remains paramount. This paper introduces GPTDrawer, an innovative pipeline that leverages the generative prowess of GPT-based models to enhance the visual synthesis process. Our methodology employs a novel algorithm that iteratively refines input prompts using keyword extraction, semantic analysis, and image-text congruence evaluation. By integrating ChatGPT for natural language processing and Stable Diffusion for image generation, GPTDrawer produces a batch of images that undergo successive refinement cycles, guided by cosine similarity metrics until a threshold of semantic alignment is attained. The results demonstrate a marked improvement in the fidelity of images generated in accordance with user-defined prompts, showcasing the system's ability to interpret and visualize complex semantic constructs. The implications of this work extend to various applications, from creative arts to design automation, setting a new benchmark for AI-assisted creative processes.
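
A minimal sketch of such a refinement loop is shown below. The three helper functions are hypothetical stand-ins (deterministic toy vectors) for the ChatGPT rewrite step, the Stable Diffusion generation step, and a CLIP-style image-text encoder; the threshold and round limit are likewise illustrative.

```python
# Sketch of a GPTDrawer-style iterative refinement loop. The helpers are
# hypothetical stand-ins (hash-seeded toy vectors), not real model calls.
import hashlib
import numpy as np

def _toy_vec(text: str) -> np.ndarray:
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).standard_normal(64)

def embed_text(prompt: str) -> np.ndarray:               # placeholder text encoder
    return _toy_vec("text:" + prompt)

def generate_and_embed_image(prompt: str) -> np.ndarray: # placeholder for SD + image encoder
    return _toy_vec("image:" + prompt)

def refine_prompt(prompt: str) -> str:                   # placeholder for a ChatGPT rewrite
    return prompt + " (refined)"

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def refinement_loop(prompt: str, threshold: float = 0.9, max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        score = cosine(embed_text(prompt), generate_and_embed_image(prompt))
        if score >= threshold:        # semantic alignment reached
            break
        prompt = refine_prompt(prompt)  # otherwise refine the prompt and retry
    return prompt

print(refinement_loop("a watercolor fox in the snow"))
```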


SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

Yin, Chun, Chi, Tai-Shih, Tsao, Yu, Wang, Hsin-Min

arXiv.org Artificial Intelligence

Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance in assessing speaker voice similarity. Experimental results on the Voice Conversion Challenge 2018 and 2020 datasets show that SVSNet+ with WavLM representations achieves significant improvements over baseline models. In addition, while fine-tuning WavLM with a small dataset of the downstream task does not improve performance, using the same dataset to learn a weighted-sum representation of WavLM can substantially improve performance. Furthermore, when WavLM is replaced by other SFMs, SVSNet+ still outperforms the baseline models and exhibits strong generalization ability.
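
The weighted-sum idea highlighted in the abstract is a common way to pool the hidden layers of a frozen foundation model: only the per-layer mixing weights are learned. A minimal PyTorch sketch, with an illustrative layer count and feature dimension (this is not the actual SVSNet+ architecture):

```python
# Learnable weighted sum over the hidden layers of a frozen SFM such as
# WavLM: only the mixing weights are trained. Dimensions are illustrative.
import torch
import torch.nn as nn

class WeightedLayerSum(nn.Module):
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))  # one scalar per layer

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, batch, time, dim) from the frozen model
        w = torch.softmax(self.weights, dim=0)                # normalized mixing weights
        return torch.einsum("l,lbtd->btd", w, hidden_states)

layers = torch.randn(13, 2, 50, 768)   # stand-in for 13 layer outputs
pooled = WeightedLayerSum(13)(layers)
print(pooled.shape)                    # torch.Size([2, 50, 768])
```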


Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures

Martinez-Gil, Jorge

arXiv.org Artificial Intelligence

The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces a novel ensemble learning approach for code similarity assessment, combining the strengths of multiple unsupervised similarity measures. The key idea is that the strengths of a diverse set of similarity measures can complement each other and mitigate individual weaknesses, leading to improved performance. Preliminary results show that while Transformer-based CodeBERT and its variant GraphCodeBERT are undoubtedly the best option in the presence of abundant training data, on specific small datasets (up to 500 samples) our ensemble achieves similar results without compromising the interpretability of the resulting solution, and with a much lower carbon footprint from training. The source code of this novel approach can be downloaded from https://github.com/jorge-martinez-gil/ensemble-codesim.
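
The released code at the URL above defines the actual measure set; the sketch below only illustrates the ensemble idea with three simple, assumed stand-in measures averaged together.

```python
# Toy ensemble of unsupervised code-similarity measures: token Jaccard,
# character-level ratio, and length similarity, averaged. The linked
# repository uses its own, richer set of measures.
from difflib import SequenceMatcher

def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def char_ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def length_sim(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 1.0

def ensemble_similarity(a: str, b: str) -> float:
    measures = (token_jaccard, char_ratio, length_sim)
    return sum(m(a, b) for m in measures) / len(measures)

x = "def add(a, b): return a + b"
y = "def add(x, y): return x + y"
print(round(ensemble_similarity(x, y), 3))
```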


Advancing multivariate time series similarity assessment: an integrated computational approach

Tonle, Franck, Tonnang, Henri, Ndadji, Milliam, Tchendji, Maurice, Nzeukou, Armand, Senagi, Kennedy, Niassy, Saliou

arXiv.org Artificial Intelligence

Data mining, particularly the analysis of multivariate time series data, plays a crucial role in extracting insights from complex systems and supporting informed decision-making across diverse domains. However, assessing the similarity of multivariate time series data presents several challenges, including dealing with large datasets, addressing temporal misalignments, and the need for efficient and comprehensive analytical frameworks. To address all these challenges, we propose a novel integrated computational approach known as Multivariate Time series Alignment and Similarity Assessment (MTASA). MTASA is built upon a hybrid methodology designed to optimize time series alignment, complemented by a multiprocessing engine that enhances the utilization of computational resources. This integrated approach comprises four key components, each addressing essential aspects of time series similarity assessment, thereby offering a comprehensive framework for analysis. MTASA is implemented as an open-source Python library with a user-friendly interface, making it accessible to researchers and practitioners. To evaluate the effectiveness of MTASA, we conducted an empirical study focused on assessing agroecosystem similarity using real-world environmental data. The results from this study highlight MTASA's superiority, achieving approximately 1.5 times greater accuracy and twice the speed compared to existing state-of-the-art integrated frameworks for multivariate time series similarity assessment. It is hoped that MTASA will significantly enhance the efficiency and accessibility of multivariate time series analysis, benefitting researchers and practitioners across various domains. Its capabilities in handling large datasets, addressing temporal misalignments, and delivering accurate results make MTASA a valuable tool for deriving insights and aiding decision-making processes in complex systems.
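
MTASA's hybrid alignment algorithm and its four components are specific to the library; as a generic illustration of the underlying pattern, the sketch below pairs a plain multivariate DTW alignment with a multiprocessing pool over series pairs.

```python
# Rough illustration of alignment-based multivariate similarity with a
# multiprocessing pool. This is generic DTW, not MTASA's hybrid algorithm.
import numpy as np
from multiprocessing import Pool

def dtw_distance(pair):
    a, b = pair                       # each: (time, variables) array
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # multivariate frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = [rng.standard_normal((40, 3)) for _ in range(4)]
    pairs = [(series[i], series[j]) for i in range(4) for j in range(i + 1, 4)]
    with Pool() as pool:              # parallelize across series pairs
        print(pool.map(dtw_distance, pairs))
```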


Learning Similarity Metrics for Volumetric Simulations with Multiscale CNNs

Kohl, Georg, Chen, Li-Wei, Thuerey, Nils

arXiv.org Artificial Intelligence

Simulations that produce three-dimensional data are ubiquitous in science, ranging from fluid flows to plasma physics. We propose a similarity model based on entropy, which allows for the creation of physically meaningful ground truth distances for the similarity assessment of scalar and vectorial data produced from transport and motion-based simulations. Utilizing two data acquisition methods derived from this model, we create collections of fields from numerical PDE solvers and existing simulation data repositories. Furthermore, a multiscale CNN architecture that computes a volumetric similarity metric (VolSiM) is proposed. To the best of our knowledge, this is the first learning method inherently designed to address the challenges that arise in the similarity assessment of high-dimensional simulation data. Additionally, the tradeoff between a large batch size and an accurate correlation computation for correlation-based loss functions is investigated, and the metric's invariance with respect to rotation and scale operations is analyzed. Finally, the robustness and generalization of VolSiM are evaluated on a large range of test data, as well as on a particularly challenging turbulence case study that is close to potential real-world applications.
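
The correlation-based loss mentioned in the abstract can be illustrated with a Pearson-correlation objective over a batch of predicted and ground-truth distances, which also makes the batch-size tradeoff concrete (larger batches give more reliable correlation estimates). This is a generic sketch, not the VolSiM network itself:

```python
# Minimal Pearson-correlation loss for correlation-based metric training:
# maximize correlation between predicted and ground-truth distances
# within a batch, so the batch size directly affects the estimate.
import torch

def correlation_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    pred = pred - pred.mean()
    target = target - target.mean()
    corr = (pred * target).sum() / (pred.norm() * target.norm() + 1e-8)
    return 1.0 - corr                 # 0 when perfectly correlated

pred = torch.tensor([0.1, 0.4, 0.9, 1.3])
target = torch.tensor([0.0, 0.5, 1.0, 1.5])
print(correlation_loss(pred, target).item())
```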


Multivariate Time-series Similarity Assessment via Unsupervised Representation Learning and Stratified Locality Sensitive Hashing: Application to Early Acute Hypotensive Episode Detection

Dhamala, Jwala, Azuh, Emmanuel, Al-Dujaili, Abdullah, Rubin, Jonathan, O'Reilly, Una-May

arXiv.org Artificial Intelligence

Timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rates. Most existing approaches apply various classification methods to statistical features explicitly extracted from vital signals. In this work, we propose to eliminate the high cost of engineering hand-crafted features from multivariate time series of physiologic signals by learning their representation with a sequence-to-sequence auto-encoder. We then propose to hash the learned representations to enable signal similarity assessment for the prediction of critical events. We apply this methodological framework to predict Acute Hypotensive Episodes (AHE) on a large and diverse dataset of vital signal recordings. Experiments demonstrate the ability of the presented framework to accurately predict an upcoming AHE.
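
The hashing step can be pictured with random-hyperplane LSH over learned representations, as sketched below; the encoder is a trivial stand-in for the paper's trained sequence-to-sequence auto-encoder, and the stratification scheme is omitted.

```python
# Sketch of the hashing step: random-hyperplane LSH over learned
# representations. The encoder is a stand-in for the trained seq2seq
# auto-encoder; stratification is omitted.
import numpy as np

rng = np.random.default_rng(42)

def encode(signal: np.ndarray) -> np.ndarray:
    # Placeholder for the trained seq2seq encoder: mean-pool over time.
    return signal.mean(axis=0)

def lsh_signature(vec: np.ndarray, planes: np.ndarray) -> tuple:
    return tuple((planes @ vec > 0).astype(int))   # one bit per hyperplane

dim, n_bits = 5, 4
planes = rng.standard_normal((n_bits, dim))

buckets: dict = {}
for i in range(100):                                # index 100 recordings
    sig = lsh_signature(encode(rng.standard_normal((60, dim))), planes)
    buckets.setdefault(sig, []).append(i)

query = encode(rng.standard_normal((60, dim)))
print(buckets.get(lsh_signature(query, planes), []))  # candidate similar signals
```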


Similarity Assessment through blocking and affordance assignment in Textual CBR

Prasath, R. Rajendra, Öztürk, Pinar

arXiv.org Artificial Intelligence

It has been proposed that children learn new objects through their affordances, that is, the actions that can be taken on them. We suggest that web pages also have affordances, defined in terms of the users' information needs they meet. An assumption of the proposed approach is that different parts of a text may not be equally important or relevant to a given query. Judging the relevance of a web document therefore requires a thorough look into its parts, rather than treating it as monolithic content. We propose a method to extract and assign affordances to texts and then use these affordances to retrieve the corresponding web pages. The overall approach presented in the paper relies on case-based representations that bridge queries to the affordances of web documents. We tested our method in the tourism domain and the results are promising.
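
Purely for illustration, the retrieval idea might be reduced to an index from affordance labels to pages, with queries mapped onto affordances; all labels and data below are invented, and the real case-based mapping is learned rather than hard-coded.

```python
# Toy illustration of affordance-based retrieval: page parts are tagged
# with affordances (information needs they serve), and queries reach
# pages through those tags. All labels and data are invented.
AFFORDANCE_INDEX = {
    "book_accommodation": ["hotel-riviera.html", "hostel-central.html"],
    "plan_itinerary": ["three-day-tour.html", "city-highlights.html"],
    "find_restaurant": ["old-town-dining.html"],
}

QUERY_AFFORDANCES = {   # stand-in for a learned query-to-affordance mapping
    "where can I stay overnight": ["book_accommodation"],
    "what should I see in two days": ["plan_itinerary"],
}

def retrieve(query: str) -> list:
    pages = []
    for affordance in QUERY_AFFORDANCES.get(query, []):
        pages.extend(AFFORDANCE_INDEX.get(affordance, []))
    return pages

print(retrieve("where can I stay overnight"))
```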


Improving KD-Tree Based Retrieval for Attribute Dependent Generalized Cases

Bergmann, Ralph (University of Trier) | Tartakovski, Alexander (Piterion GmbH)

AAAI Conferences

Generalized cases are cases that cover a subspace rather than a point in the problem-solution space. Attribute-dependent generalized cases are a subclass of generalized cases that causes high computational complexity during similarity assessment. We present a new approach for efficient index-based retrieval of such generalized cases using an improved kd-tree. The experimental evaluation demonstrates a significant improvement in retrieval efficiency compared to previous methods.
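
For contrast with the paper's contribution, a baseline point-case kd-tree retrieval is easy to sketch with SciPy; generalized cases covering subspaces are exactly what this ordinary point query does not handle.

```python
# Baseline kd-tree retrieval over point cases with SciPy. The paper's
# improvement handles generalized cases covering subspaces, which this
# ordinary point query does not capture.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(7)
case_base = rng.uniform(0, 1, size=(1000, 4))   # 1000 point cases, 4 attributes
tree = cKDTree(case_base)

query = rng.uniform(0, 1, size=4)
dist, idx = tree.query(query, k=3)              # three nearest cases
print(idx, np.round(dist, 3))
```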